Detecting hidden batch factors through data-adaptive adjustment for biological effects.
نویسندگان
چکیده
Motivation Batch effects are one of the major source of technical variations that affect the measurements in high-throughput studies such as RNA sequencing. It has been well established that batch effects can be caused by different experimental platforms, laboratory conditions, different sources of samples and personnel differences. These differences can confound the outcomes of interest and lead to spurious results. A critical input for batch correction algorithms is the knowledge of batch factors, which in many cases are unknown or inaccurate. Hence, the primary motivation of our paper is to detect hidden batch factors that can be used in standard techniques to accurately capture the relationship between gene expression and other modeled variables of interest. Results We introduce a new algorithm based on data-adaptive shrinkage and semi-Non-negative Matrix Factorization for the detection of unknown batch effects. We test our algorithm on three different datasets: (i) Sequencing Quality Control, (ii) Topotecan RNA-Seq and (iii) Single-cell RNA sequencing (scRNA-Seq) on Glioblastoma Multiforme. We have demonstrated a superior performance in identifying hidden batch effects as compared to existing algorithms for batch detection in all three datasets. In the Topotecan study, we were able to identify a new batch factor that has been missed by the original study, leading to under-representation of differentially expressed genes. For scRNA-Seq, we demonstrated the power of our method in detecting subtle batch effects. Availability and implementation DASC R package is available via Bioconductor or at https://github.com/zhanglabNKU/DASC. Contact [email protected] or [email protected]. Supplementary information Supplementary data are available at Bioinformatics online.
منابع مشابه
Microarray-Based RNA Profiling of Breast Cancer: Batch Effect Removal Improves Cross-Platform Consistency
Microarray is a powerful technique used extensively for gene expression analysis. Different technologies are available, but lack of standardization makes it challenging to compare and integrate data. Furthermore, batch-related biases within datasets are common but often not tackled. We have analyzed the same 234 breast cancers on two different microarray platforms. One dataset contained known b...
متن کاملDetecting, correcting, and preventing the batch effects in multi-site data, with a focus on gene expression Microarrays
Gene expression microarrays are widely used to better understand the complex biological mechanisms inside cells. One of the main obstacles of applying statistical learning algorithms to microarray data is the large gap between the number of features (p) and the number of available instances (n), i.e., the “large p, small n” challenge. This thesis explores two ways to deal with this challenge. O...
متن کاملsvaseq: removing batch effects and other unwanted noise from sequencing data
It is now known that unwanted noise and unmodeled artifacts such as batch effects can dramatically reduce the accuracy of statistical inference in genomic experiments. These sources of noise must be modeled and removed to accurately measure biological variability and to obtain correct statistical inference when performing high-throughput genomic analysis. We introduced surrogate variable analys...
متن کاملA Novel Method for Automated Estimation of Effective Parameters of Complex Auditory Brainstem Response: Adaptive Processing based on Correntropy Concept
Objectives: Automated Auditory Brainstem Responses (ABR) peak detection is a novel technique to facilitate the measurement of neural synchrony along the auditory pathway through the brainstem. Analyzing the location of the peaks in these signals and the time interval between them may be utilized either for analyzing the hearing process or detecting peripheral and central lesions in the human he...
متن کاملProviding a Model for Detecting Tax Fraud Based on the Personality Types of Corporate Financial Managers using the Neural Network Approach
One of the management measures to reduce tax liabilities is non-payment of taxes through tax fraud. Because personality factors may play a role in explaining tax ethics, examining personality traits and aspects of tax fraud can help to better understand the factors that influence tax decisions. The main purpose of this study is to provide a model for detecting tax fraud based on the personality...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Bioinformatics
دوره 34 7 شماره
صفحات -
تاریخ انتشار 2018